Efficient pebbling for list traversal synopses
We show how to support efficient back traversal in a unidirectional list,
using small memory and with essentially no slowdown in forward steps. Using
memory for a list of size , the 'th back-step from the
farthest point reached so far takes time in the worst case, while
the overhead per forward step is at most for an arbitrarily small
constant . An arbitrary sequence of forward and back steps is
allowed. A full trade-off between memory usage and time per back-step is
presented: vs. and vice versa. Our algorithms are based on a
novel pebbling technique which moves pebbles on a virtual binary, or -ary,
tree that can only be traversed in a pre-order fashion. The compact data
structures used by the pebbling algorithms, called list traversal synopses,
extend to general directed graphs, and have other interesting applications,
including memory efficient hash-chain implementation. Perhaps the most
surprising application is in showing that for any program, arbitrary rollback
steps can be efficiently supported with small overhead in memory, and marginal
overhead in its ordinary execution. More concretely: Let be a program that
runs for at most steps, using memory of size . Then, at the cost of
recording the input used by the program, and increasing the memory by a factor
of to , the program can be extended to support an
arbitrary sequence of forward execution and rollback steps: the 'th rollback
step takes time in the worst case, while forward steps take O(1)
time in the worst case, and amortized time per step.
Comment: 27 pages
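As an illustration of the memory/time trade-off the abstract describes, here is a minimal checkpointing baseline in Python. This is a hedged sketch only, not the paper's pebbling algorithm (which achieves far better bounds): we record a pointer every `spacing` nodes, so a back-step replays at most `spacing` forward links from the nearest checkpoint, at a memory cost of O(n/spacing) saved pointers. All class and function names here are illustrative.

```python
class Node:
    def __init__(self, value, nxt=None):
        self.value = value
        self.next = nxt

def build_list(values):
    # build a singly linked (unidirectional) list
    head = None
    for v in reversed(values):
        head = Node(v, head)
    return head

class BackTraverser:
    """Checkpoint every `spacing` nodes; back-steps replay from the
    nearest checkpoint at or before the target position."""
    def __init__(self, head, spacing):
        self.spacing = spacing
        self.pos = 0
        self.node = head
        self.checkpoints = {0: head}   # position -> node

    def forward(self):
        self.node = self.node.next
        self.pos += 1
        if self.pos % self.spacing == 0:
            self.checkpoints[self.pos] = self.node

    def back(self):
        assert self.pos > 0
        target = self.pos - 1
        base = (target // self.spacing) * self.spacing
        node = self.checkpoints[base]
        for _ in range(target - base):   # at most spacing - 1 replayed links
            node = node.next
        self.node, self.pos = node, target

head = build_list(list(range(10)))
t = BackTraverser(head, spacing=3)
for _ in range(7):
    t.forward()
t.back()
t.back()
print(t.node.value)   # -> 5
```

The paper's pebbling scheme replaces this fixed-spacing policy with pebbles moved on a virtual tree, shrinking both the memory and the per-back-step replay cost.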
Efficient Bundle Sorting
AMS subject classification. 68W01
DOI. 10.1137/S0097539704446554
Many data sets to be sorted consist of a limited number of distinct keys. Sorting such data sets can be thought of as bundling together identical keys and having the bundles placed in order; we therefore denote this as bundle sorting. We describe an efficient algorithm for bundle sorting in external memory, which requires at most c(N/B) log_{M/B} k disk accesses, where N is the number of keys, M is the size of internal memory, k is the number of distinct keys, B is the transfer block size, and 2 < c < 4. For moderately sized k, this bound circumvents the Θ((N/B) log_{M/B}(N/B)) I/O lower bound known for general sorting. We show that our algorithm is optimal by proving a matching lower bound for bundle sorting. The improved running time of bundle sorting over general sorting can be significant in practice, as demonstrated by experimentation. An important feature of the new algorithm is that it is executed "in-place," requiring no additional disk space
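The bundling idea has a simple in-memory analogue, sketched below with illustrative names. This is not the paper's external-memory algorithm (whose contribution is the I/O bound with block transfers); it only shows the core step of grouping equal keys into contiguous bundles in place, cycle-leader style, after one counting pass.

```python
def bundle_sort(a):
    """Group equal keys into ordered, contiguous bundles, in place."""
    keys = sorted(set(a))                  # the k distinct keys
    count = {}
    for x in a:
        count[x] = count.get(x, 0) + 1
    # nxt[key] = next unfilled slot in key's bundle; end[key] = one past it
    nxt, end, offset = {}, {}, 0
    for key in keys:
        nxt[key] = offset
        offset += count[key]
        end[key] = offset
    for key in keys:
        while nxt[key] < end[key]:
            x = a[nxt[key]]
            if x == key:
                nxt[key] += 1              # already in its bundle
            else:
                # swap x into its own bundle's next free slot
                a[nxt[key]], a[nxt[x]] = a[nxt[x]], a[nxt[key]]
                nxt[x] += 1
    return a

print(bundle_sort([2, 1, 3, 1, 2, 2]))    # -> [1, 1, 2, 2, 2, 3]
```

Each swap places one element permanently, so the permutation pass is linear; the external-memory version instead distributes bundles across disk blocks over log_{M/B} k passes.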
Efficient Bundle Sorting
This is the published version. Copyright © 2006 Society for Industrial and Applied Mathematics
Approximate Data Structures with Applications
In this paper we introduce the notion of approximate
data structures, in which a small amount of error is
tolerated in the output. Approximate data structures
trade error of approximation for faster operation, leading to theoretical and practical speedups for a wide variety of algorithms. We give approximate variants of the van Emde Boas data structure, which support the same dynamic operations as the standard van Emde Boas data structure [28, 20], except that answers to queries are approximate. The variants support all operations in constant time provided the error of approximation is 1/polylog(n), and in O(log log n) time provided the error
is 1/polynomial(n), for n elements in the data structure.
We consider the tolerance of prototypical algorithms to approximate data structures. We study in particular Prim's minimum spanning tree algorithm, Dijkstra's single-source shortest paths algorithm, and an on-line variant of Graham's convex hull algorithm. To obtain output which approximates the desired output
with the error of approximation tending to zero, Prim's algorithm requires only linear time, Dijkstra's algorithm requires O(m log log n) time, and the on-line variant of Graham's algorithm requires constant amortized time per operation
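The error-for-speed trade can be sketched with a much simpler structure than the paper's approximate van Emde Boas trees: round each key to a power of (1 + eps), so queries only inspect buckets and any answer is within a (1 + eps) factor of the truth. The class below is an illustrative assumption of this bucketing idea, not the paper's construction.

```python
import math

class ApproxMinStructure:
    """Multiset of positive reals supporting insert/delete and an
    approximate minimum within a (1 + eps) multiplicative factor."""
    def __init__(self, eps):
        self.eps = eps
        self.buckets = {}                  # bucket index -> count

    def _idx(self, x):
        # bucket b holds keys in [(1+eps)^b, (1+eps)^(b+1))
        return math.floor(math.log(x, 1 + self.eps))

    def insert(self, x):
        i = self._idx(x)
        self.buckets[i] = self.buckets.get(i, 0) + 1

    def delete(self, x):
        i = self._idx(x)
        self.buckets[i] -= 1
        if self.buckets[i] == 0:
            del self.buckets[i]

    def approx_min(self):
        # the lower edge of the lowest non-empty bucket is within a
        # (1 + eps) factor below the true minimum
        return (1 + self.eps) ** min(self.buckets)

s = ApproxMinStructure(eps=0.1)
for v in [7.0, 3.0, 12.5]:
    s.insert(v)
m = s.approx_min()
assert m <= 3.0 <= m * 1.1    # true minimum is 3.0
```

Shrinking eps tightens the answer but multiplies the bucket count, which is exactly the trade-off the abstract's 1/polylog(n) and 1/polynomial(n) regimes quantify.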
Dynamic Generation of Discrete Random Variates
The original publication is available at www.springerlink.com
We present and analyze efficient new algorithms for generating a random variate distributed according
to a dynamically changing set of N weights. The base version of each algorithm generates the
discrete random variate in O(log* N) expected time and updates a weight in O(2^{log* N}) expected
time in the worst case. We then show how to reduce the update time to O(log* N) amortized
expected time. We finally show how to apply our techniques to a lookup-table technique in order
to obtain expected constant time in the worst case for generation and update. We give parallel
algorithms for parallel generation and update having optimal processor-time product.
Besides the usual application in computer simulation, our method can be used to perform
constant-time prediction in prefetching applications. We also apply our techniques to obtain an
efficient dynamic algorithm for maintaining an approximate heap of N elements, in which each query
is required to return an element whose value is within an ε multiplicative factor of the maximal
element value. For ε = 1/polylog(N), each query, insertion, or deletion takes O(log log log N) time
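For contrast with the bounds above, here is the standard O(log N) baseline for dynamic weighted sampling, sketched with a Fenwick (binary indexed) tree over the weights: sample index i with probability w[i]/sum(w), and change any weight on the fly. This is an illustrative baseline interface, not the paper's algorithm, which achieves substantially stronger expected bounds.

```python
import random

class DynamicSampler:
    """Weighted sampling over N dynamically updatable weights,
    O(log N) per generation and per update, via a Fenwick tree."""
    def __init__(self, weights):
        self.n = len(weights)
        self.tree = [0.0] * (self.n + 1)
        self.w = [0.0] * self.n
        for i, wt in enumerate(weights):
            self.update(i, wt)

    def update(self, i, weight):
        delta = weight - self.w[i]
        self.w[i] = weight
        j = i + 1
        while j <= self.n:                 # climb the Fenwick tree
            self.tree[j] += delta
            j += j & (-j)

    def _total(self):
        s, j = 0.0, self.n
        while j > 0:                       # prefix sum of all n weights
            s += self.tree[j]
            j -= j & (-j)
        return s

    def sample(self):
        r = random.random() * self._total()
        pos, bit = 0, 1
        while bit * 2 <= self.n:
            bit *= 2
        while bit:                         # descend: largest pos with
            nxt = pos + bit                # prefix(pos) <= r
            if nxt <= self.n and self.tree[nxt] <= r:
                r -= self.tree[nxt]
                pos = nxt
            bit //= 2
        return pos                         # 0-based sampled index

s = DynamicSampler([0.0, 1.0, 0.0])
print(s.sample())    # always 1: all mass on index 1
s.update(1, 0.0)
s.update(2, 5.0)
print(s.sample())    # always 2 after the updates
```

The lookup-table refinement the abstract mentions replaces this logarithmic descent with expected constant-time generation.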
Accounting for Memory Bank Contention and Delay in High-Bandwidth Multiprocessors
This paper considers issues of memory performance in shared memory multiprocessors that provide a high-bandwidth network and in which the memory banks are slower than the processors. We are concerned with the effects of memory bank contention, memory bank delay, and the bank expansion factor (the ratio of number of banks to number of processors) on performance, particularly for irregular memory access patterns. This work was motivated by observed discrepancies between predicted and actual performance in a number of irregular algorithms implemented for the Cray C90 when the memory contention at a particular location is high. We develop a formal framework for studying memory bank contention and delay, and show several results, both experimental and theoretical. We first show experimentally that our framework is a good predictor of performance on the Cray C90 and J90, providing a good accounting of bank contention and delay. Second, we show that it often improves performance to have addi..
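A toy model (an assumption-laden simplification of mine, not the paper's formal framework) makes the role of the expansion factor concrete: if b banks each need d cycles per request, a round of requests takes at least d times the maximum number of requests landing on one bank, and more banks per processor spread random requests more thinly.

```python
import random

def round_time(addresses, num_banks, bank_delay):
    """Lower bound on cycles to serve one round of memory requests:
    bank_delay times the most heavily loaded bank's request count."""
    load = [0] * num_banks
    for a in addresses:
        load[a % num_banks] += 1       # bank chosen by low-order bits
    return bank_delay * max(load)

random.seed(1)
p = 64                                 # processors, one request each
for expansion in (1, 2, 4, 8):
    banks = p * expansion
    addrs = [random.randrange(1 << 20) for _ in range(p)]
    print(expansion, round_time(addrs, banks, bank_delay=4))
```

Running this shows the maximum bank load (and hence the round time) shrinking as the expansion factor grows, the qualitative effect the paper quantifies and validates against the Cray machines.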